## [1] "Names of variables "
## [1] "fixed.acidity" "volatile.acidity" "citric.acid"
## [4] "residual.sugar" "chlorides" "free.sulfur.dioxide"
## [7] "total.sulfur.dioxide" "density" "pH"
## [10] "sulphates" "alcohol" "quality"
## [1] "Dimensions of wine data"
## [1] 1599 12
## [1] "Structure of wine data"
## 'data.frame': 1599 obs. of 12 variables:
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
## [1] "Summary of Redwine data"
## fixed.acidity volatile.acidity citric.acid residual.sugar
## Min. : 4.60 Min. :0.1200 Min. :0.000 Min. : 0.900
## 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090 1st Qu.: 1.900
## Median : 7.90 Median :0.5200 Median :0.260 Median : 2.200
## Mean : 8.32 Mean :0.5278 Mean :0.271 Mean : 2.539
## 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420 3rd Qu.: 2.600
## Max. :15.90 Max. :1.5800 Max. :1.000 Max. :15.500
## chlorides free.sulfur.dioxide total.sulfur.dioxide
## Min. :0.01200 Min. : 1.00 Min. : 6.00
## 1st Qu.:0.07000 1st Qu.: 7.00 1st Qu.: 22.00
## Median :0.07900 Median :14.00 Median : 38.00
## Mean :0.08747 Mean :15.87 Mean : 46.47
## 3rd Qu.:0.09000 3rd Qu.:21.00 3rd Qu.: 62.00
## Max. :0.61100 Max. :72.00 Max. :289.00
## density pH sulphates alcohol
## Min. :0.9901 Min. :2.740 Min. :0.3300 Min. : 8.40
## 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500 1st Qu.: 9.50
## Median :0.9968 Median :3.310 Median :0.6200 Median :10.20
## Mean :0.9967 Mean :3.311 Mean :0.6581 Mean :10.42
## 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300 3rd Qu.:11.10
## Max. :1.0037 Max. :4.010 Max. :2.0000 Max. :14.90
## quality
## Min. :3.000
## 1st Qu.:5.000
## Median :6.000
## Mean :5.636
## 3rd Qu.:6.000
## Max. :8.000
Quality can range from 0 to 10, but in this data the minimum is 3 and the maximum is 8. The median is 6 and the first and third quartiles are 5 and 6, which means that most of the wines in this analysis are average wines.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.636 6.000 8.000
The data set describes red wine quality and the chemical components of each wine. There are 1599 samples of wine with 12 variables: 11 chemical measurements of type numeric (fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol) and 1 rating, quality, of type int.
Quality is the main feature of interest. It was rated by 3 wine experts according to their knowledge and experience. Quality can range from 0 to 10, but in our data the lowest quality is 3 and the highest is 8. Let's find out which factors matter most for a high-quality wine.
Other features could help, since many real-world factors affect the quality of red wine: the type of grapes used; flavor (for example, the combination of different ingredients); color; taste (sweet, sour, bitter, etc.); and the total cost from ingredients to final production, since a high-quality wine produced at low cost is especially valuable.
Yes, I created total.acidity and combined.sulphur.dioxide, which may reveal trends that the individual variables do not show.
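The two derived variables could be built along these lines. This is only a sketch: the formulas used in the analysis are not shown, so the definitions here are assumptions (total.acidity as the sum of the three acid measurements, combined.sulphur.dioxide as total minus free sulfur dioxide). The small data frame holds just the first ten rows printed in the structure output above, not the full 1599 rows.

```r
# First ten rows, copied from the str() output above (the full data has 1599 rows).
wine <- data.frame(
  fixed.acidity        = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7.5),
  volatile.acidity     = c(0.7, 0.88, 0.76, 0.28, 0.7, 0.66, 0.6, 0.65, 0.58, 0.5),
  citric.acid          = c(0, 0, 0.04, 0.56, 0, 0, 0.06, 0, 0.02, 0.36),
  free.sulfur.dioxide  = c(11, 25, 15, 17, 11, 13, 15, 15, 9, 17),
  total.sulfur.dioxide = c(34, 67, 54, 60, 34, 40, 59, 21, 18, 102)
)

# ASSUMPTION: total acidity is the sum of the three acid columns.
wine$total.acidity <- with(wine, fixed.acidity + volatile.acidity + citric.acid)

# ASSUMPTION: combined (bound) sulfur dioxide is total minus free.
wine$combined.sulphur.dioxide <- with(wine, total.sulfur.dioxide - free.sulfur.dioxide)

# Inspect the new columns.
head(wine[, c("total.acidity", "combined.sulphur.dioxide")], 3)
```

If the analysis defined either variable differently, only the two assignment lines need to change.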
Volatile acidity has a bimodal distribution and citric acid has a long-tailed distribution; neither is normally distributed. The data was already tidy, so no adjustments were required.
## [1] "Correlation among the variables"
## volatile.acidity total.sulfur.dioxide density
## -0.39055778 -0.18510029 -0.17491923
## chlorides pH free.sulfur.dioxide
## -0.12890656 -0.05773139 -0.05065606
## residual.sugar fixed.acidity citric.acid
## 0.01373164 0.12405165 0.22637251
## sulphates alcohol quality
## 0.25139708 0.47616632 1.00000000
Looking at the correlations, alcohol (0.48) and volatile acidity (-0.39) have the strongest correlation with the quality of wine. Sulphates (0.25) and citric acid (0.23) are also correlated with quality, though more weakly. Residual sugar (0.01) has almost no correlation with quality.
Since quality is the feature of interest, the correlation between quality and every other variable in the dataset was examined. Quality is positively correlated with alcohol content, sulphates and citric acid, and negatively correlated with volatile acidity, total sulfur dioxide, density and chlorides.
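A sorted correlation vector like the one above can be produced with `cor()` and `sort()`. The sketch below is not the original code; it uses only the first ten rows printed in the structure output and a subset of the columns, so its coefficients will differ from the full-data values shown above.

```r
# First ten rows of selected columns, copied from the str() output above.
wine <- data.frame(
  volatile.acidity = c(0.7, 0.88, 0.76, 0.28, 0.7, 0.66, 0.6, 0.65, 0.58, 0.5),
  residual.sugar   = c(1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2, 2, 6.1),
  sulphates        = c(0.56, 0.68, 0.65, 0.58, 0.56, 0.56, 0.46, 0.47, 0.57, 0.8),
  alcohol          = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  quality          = c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)
)

# Pearson correlation matrix, keeping only the "quality" column,
# sorted from most negative to most positive.
quality_cor <- sort(cor(wine)[, "quality"])

print("Correlation among the variables")
print(quality_cor)
```

Sorting puts the strongest negative relationships first and the strongest positive ones last, with quality itself (correlation 1) at the end.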
pH and volatile acidity are positively correlated. This is surprising: a higher pH value means less acidity, yet the plots show higher pH going together with higher volatile acidity. Density has a strong negative correlation with the amount of alcohol in the wine. I was expecting a close relation between sulphates and sulfur dioxide, but there seems to be none, with a correlation coefficient of 0.04.
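A single pair of variables can be checked with `cor.test()`, which reports the coefficient together with a confidence interval. The following is a sketch on the first ten rows printed in the structure output only, so it illustrates the call and not the full-data result.

```r
# First ten rows of the two columns, copied from the str() output above.
wine <- data.frame(
  volatile.acidity = c(0.7, 0.88, 0.76, 0.28, 0.7, 0.66, 0.6, 0.65, 0.58, 0.5),
  pH               = c(3.51, 3.2, 3.26, 3.16, 3.51, 3.51, 3.3, 3.39, 3.36, 3.35)
)

# Pearson correlation test between pH and volatile acidity.
ct <- cor.test(wine$pH, wine$volatile.acidity, method = "pearson")

ct$estimate  # sample correlation coefficient
ct$conf.int  # 95% confidence interval for the coefficient
```

With so few rows the interval is wide; on the full data the same call gives a much tighter estimate.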
Correlation of quality with other variables:
## volatile.acidity total.sulfur.dioxide density chlorides
## -0.39055778 -0.18510029 -0.17491923 -0.12890656
## pH free.sulfur.dioxide residual.sugar fixed.acidity
## -0.05773139 -0.05065606 0.01373164 0.12405165
## citric.acid sulphates alcohol quality
## 0.22637251 0.25139708 0.47616632 1.00000000
From the correlations we can clearly see that alcohol (positively) and volatile.acidity (negatively) have the strongest relation with quality. Density and fixed acidity also have a strong correlation with each other.